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Hamiltonian Monte Carlo provides efficient Markov transitions at the expense of introducing two 
free parameters: a step size and total integration time. Because the step size controls discretization 
error it can be readily tuned to achieve certain accuracy criteria, but the total integration time is 
left unconstrained. Recently Hoffman and Gelman proposed a criterion for tuning the integration 
time in certain systems with their No U-Turn Sampler, or NUTS. In this paper I investigate the 
dynamical basis for the success of NUTS and generalize it to Riemannian Manifold Hamiltonian 
Monte Carlo. 
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Taking advantage of a natural mapping between forms 
on a manifold and measures on a probability space, 
Hamiltonian Monte Carlo (HMC) generates Metropolis 
proposals by simulating Hamiltonian flow. The autocor- 
relation of the resulting Markov chain depends on how 
long the flow is integrated, but there are no conditions on 
the optimal integration time that minimizes autocorrela- 
tion for a general target distribution. When the target 
distribution is unimodal, however, there is a natural cri- 
terion: turning points of the resultant trajectories. The 
No U-Turn Sampler identifies these turning points for Eu- 
clidean manifolds, but the criterion begins to fail when 
applied to more complex distributions and Riemannian 
manifolds. Appealing to the geometry of HMC, however, 
admits a straightforward generalization of the No U-Turn 
Sampler that is not only amenable to Riemannain mani- 
folds but also isolates the turning points in more compli- 
cated, non-convex target distributions. 



HAMILTONIAN MONTE CARLO 

Given a target density 7r(q), Hamiltonian Monte 
Carlo [l|-y| yields Metropolis proposals for the extended 
density, 

7r(p,q) =cxp[-i7(p,q)], 

where the Hamiltonian, iJ(p,q), is defined as 

ff(p,q)=r(p,q) + l/(q), 

with the potential energy, V{q) defined by the target 
density, 

F(q) = -log7r(q). 

Up to a few weak constraints, the kinetic energy, T(p, q) 
can to tuned to improve the proposals. 

In particular, HMC defines the potential as a func- 
tion on some manifold A4 with coordinate functions q, 
and the Hamiltonian as scalar function on the cotangent 
bundle, T*A4, with coordinate functions {p,q}. At a 



point S G T*A4 the Hamiltonian is then more formally 
written as 

H{S)^TiS) + ViR), 

where R = Tr{S) G A^. 

Given the natural symplectic geometry of T*Ai, the 
Hamiltonian defines a vector field, and integrating along 
the vector field for some time t generates a flow that maps 
any point, Sq G T*M, to some final point, St S T*M, 
that serves as the proposal. 

Integrating for only a short time yields highly corre- 
lated points and random walk behavior, but trajecto- 
ries that are integrated for too long may double back on 
themselves and waste computational resources. In order 
to optimize the proposal, we need to determine exactly 
how long to evolve the trajectories in order to maximize 
the distance between proposals and minimize the auto- 
correlation of the chain. 



HARMONIC MOTION 

In many applications, a Markov chain is meant to sam- 
ple from a single mode of the target distribution, which 
manifests as a well in the potential energy. For small 
enough values of Hamiltonian, the trajectories become 
bound in the neighborhood of the mode and execute har- 
monic motion. 

For example, consider Euclidean Manifold Hamiltonian 
Monte Carlo (EMHMC), 
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with a quadratic potential, 



jk- 



jk 



" |betanalpha@gmaiI.com| 



^ Here tt is the natural projection operator vr : T'M — ^ M on 
the cotangent bundle, not the notation for probability density 
used above. A point _R S A4 is identified by the same position 
coordinate functions q in T'M, so that q(-R) = fl{S). The value 
of the momentum coordinate functions identify a unique covcctor 
on the base manifold, p{-R) £ T^Ai. 



The trajectories from this Hamiltonian execute simple 



harmonic motion (J| along the the directions Nj ~ v M • 
lij, where the fij are the eigenvectors of vM^ • W • 
vM^- . In terms of the coordinate functionto, 

q(^0 = A ^ exp {i {ujjt + (bj + 7r/2)] N^' (1) 

j 
p(5t) = A ^ exp [i {ujjt + (j)j)] Nj . (2) 

3 

with the frequencies given by the respective eigenvalues, 
uij = ^/Xj, and the amplitude, A, and phases, (j)j, deter- 
mined by the initial conditions. 

The motion along each direction Nj turns back on it- 
self after reaching a turning point, where the projection of 
the momentum vanishes in the process of changing sign. 
Because the extent of the trajectory is limited by the os- 
cillation with the longest period, Tj = 27r/a;j, maximiz- 
ing distance is equivalent to integrating until reaching 
the turning point of the longest oscillation, or TPOLO 
for succintness. For a more general Hamiltonian the mo- 
tion will be more complex but still harmonic, and the 
TPOLO still determines the optimal integration time. 

Sampling from a Gaussian distribution A/'(0, S) with 



S = 



1 P 
P 1 



and p = 0.95, for example, yields the potential 

The resulting Hamiltonian dynamics execute simple har- 
monic motion, with the bulk of the motion turning back 
on itself at the TPOLOs (Figure [J). 

Identifying the slowest turning point during a trajec- 
tory, however, is not an easy task. 



THE NO U-TURN SAMPLER 

In HMC, naively terminating a trajectory based on 
some dynamic criterion sacrifices detailed balance and 
threatens convergence to the target distribution. The 
No U-Turn Sampler, or NUTS, admits a dynamic termi- 
nation criterion that preserves detailed balance by using 
Hamiltonian dynamics to build a binary tree from which 
the transition is sampled [5|. 



The termination criterion itself attempts to identify 
the longest turning point by taking advantage of the ge- 
ometry of EMHMC. Here the base manifold, Ai, is en- 
dowed with a Euclidean geometry where the distance be- 
tween two points, r{Ri,R2) = q(^i) — q(^2), is a well- 
defined vector. Because the momentum is tangential to 
the trajectory when M = I, the contraction of the dis- 
tance onto the current momentum vanishef|j. 



exactly when the trajectory should have started to turn 
back on itself, and consequently provides a valid termi- 
nation criterion. 

In practice, the success of the NUTS criterion is not 
limited to the case of a unit mass matrix. Consider the 
expansion of the distance as an integral over the trajec- 
tory, 

r(i?t,i?o)=q(i?t)-q»(i?o) 

= / dTM-^p{Rr) 
= M-^ f dTp{Rr) 

Jq 
= tM-'p{Rt), 



where 



1 /■* 

piRt)=- / dTp{Rr). 
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In terms of p{Rt), the NUTS criterion becomes 
Y,P3iRt)r'{Rt,Ro) = 



tJ2p,{Rt)pk{Rt){M-Y^0 



jk 



Y,p,iRt)pk{Rt){M-Y^O 



jk 

where the inner product on Ai is defined by 

jk 



^ The following uses complex exponential notation with an im- 
plicit projection to the real axis assumed. For example, exp [iu)t\ 



formally represents TZ. (exp [iujt] ) 
cos (u)i) . 



V, (cos {uit) + i sin {uit) ) 



^ Here we're taking advantage of the fact that a point S G T* M 
identifies both a point, R = it{S) £ M and a covector, p(-R) G 

M. 
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FIG. 1. Given the quadratic potential ([3]), the Hamiltonian trajectories execute simple harmonic motion with (a) eigenfrequen- 
cies cj = (1 ± p)^^ for M = I and (b) degenerate eigenfrequencies lo = \ for M = S~^, with phases and amplitude determined 
by the initial position and momentum. The trajectories turn back on themselves at the turning points of the longest oscillation, 
which align with the semi-major axis of the potential. 



Now, for simple harmonic m.otion ([2]) , 

1 /■* 

p{Rt) = - / dr A ^ exp [i {uj,t -f 0^ )] Nj 

t Jo ^ 

1 /■* 

= A V -- / dr exp [i (wjt + (/)t)] Nj 
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A^ ^ — 7 (exp [i {ojjt + (/)j)] - exp [i0j]) 
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X exp [i {ujkt + 0fc)] (Nj, Nfc)j^_i 

(exp [i (oiji + (f)j)] — exp [ii^j]) exp [i (ujjt + (pj)] 



icojt 



A" > — ; — (exp [iWjtJ — Ij exp \i (ojjt + (pj)\ 



As the trajectory evolves higher frequency components 
begin to decay as LJjt ^ 1, isolating the slowest oscilla- 
tion or at least a set of slow oscillations with degenerate 
frequencies^- In either case the criterion reduces to 



o = (p(i?0,p(i?0>M 



(exp [iojt] — 1) exp [iujt] 2^ exp [i2(/)j] . 



The first term in parentheses vanishes after every com- 
plete oscillation; the more interesting zero occurs when 



exp [iLut] y exp [i2(j)j] — 0. 



If the phases are coherent, (j)j — 0Vj, then the zero occurs 
at the nearest turning point. 



T 

t= — 
2 



-^) 



n G Z, 



but if the phases are incoherent then the total momentum 
is constant and there are no unique turning points along 



Note that the Nj are orthogonal with respect to the metric M ^ . 



^ In practice, the transient behavior can lead to premature satis- 
faction of the criterion; care must be taken when trajectories are 
terminated quickly. 



the trajectory. In this case the sum tends towards unity 
leaving the zero at 



t 
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which maximizes the distance by integrating to the op- 
posite point of the orbit. 

In the more general case the phase offsets lead to beat- 
ing, and the criterion vanishes at the average of the turn- 
ing points, 



T 
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For simple harmonic motion the NUTS criterion iden- 
tifies the first TPOLO along the trajectory, and hence 
the optimal integration time, for any EMHMC system 
(Figure H [3D. 

Although the NUTS criterion performs well in poten- 
tials that induce more complicated harmonic motion, it 
eventually begins to fail when applied to wells with more 
intricate geometry. The potential from the banana or 
twisted Gaussian distribution [dl. 



ViR) 



ql{R) , (g2(i?)+/39?(i?)- 100/3)' 



(4) 



with j3 — 0.03, cTi = 0.01, and (72 ~ 1, for example, 
yields a non-convex well on which NUTS prematurely 
terminates (Figure H]). 

Riemannian Manifiold Hamiltonian Monte Carlo 
(RMHMC) 7] 



^(^) = ^Ep'(^)p^-(^)A''(^)-Ji°g|^(^)i 



V{R), 



jk 



endows the base manifold with a Riemannian geometry 
that can simplify the motion in such complex potentials, 
but here the distance r has no real meaning. Although 
the trajectories are much smoother in RMHMC than 
EMHMC, the now ill-defined NUTS criterion still ter- 
minates prematurely (Figure [5]). 



IDENTIFYING TURNING POINTS ON 
RIEMANNIAN MANIFOLDS 

Although the motivating construction of the NUTS cri- 
terion required a measure of distance on the manifold, the 
intermediate form derived above does not. In particular. 



with 



(p(i?0,p(i?t))A(fl,)<0 



1 /■* 

p{Rt) = - / drpiRr), 
'- Jo 



is entirely well-defined on a Riemannian manifold, pro- 
vided that we can add momenta from different points 
along the trajectory. Fortunately, the symplectic geome- 
try of T*Jv[ furnishes such a means. 

The canonical one-form, 6 G T* (T*Ai) [J|, given in 
coordinates as 



e = J2Pjdq^, 



is a natural object on the cotangent bundle. Note that 9 
has no dpj components in any coordinate system — it is 
a horiztonal form on T*M. 

Now the canonical one-form can be Lie dragged along 
the Hamiltonian flow [91, 



e+ f drCgO, 
Jo 



where the components of the Lie derivative, Cg9, are 
given by 
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Dragging from the beginning of a trajectory to the end 
then yieldcl 



eu{St)^Y. 



..(.0+/ d.f 



M'{So) 



= [p,{St)+pj{s^)-p,{St)W{St) 

^Pj{So)dq^{St). 

Because is a horizontal form, 01^ defines a unique cov- 
ector p*{Rt), with components 

p*iRt)^{9;{St))^ 

^PjiRo), 



^ Lie dragging is canonically defined as against the flow of the 
vector field, from the end of a trajectory to the beginning. Here 
we're dragging with the flow, hence the the minus sign. 
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FIG. 2. When applied to the quadratic potential (|3]) with M = I (Figure [T^), the NUTS criterion terminates the trajectory at 
the first TPOLO, as desired. 
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FIG. 3. The NUTS criterion successfully identifies the first TPOLO even for more sophisticated choices of the mass matrix, 
such as the quadratic potential Q with M — S~^ (FigurefTJs). 
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FIG. 4. On more complicated potentials, such as that arising from the banana distribution Q, the NUTS criterion terminates 
prematurely even when ignoring the transient behavior. 



Banana (RMHMC w/ SoftAbs Metric) 
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FIG. 5. RMHMC trajectories, here using the SoftAbs metric [3] with a = 1, yield much smoother trajectories in the banana 
potential Q, but the ill-defined NUTS criterion still terminates prematurely. 



on the base manifold at Rt = 7r(5't) € Ai. This then 
defines a mapping of the momenta at any point Rq along 
a trajectory to the Rq (Figure E]). 

We can now define p{Rt) for any RMHMC system as 

p{Rt) = \ I drp*(i?0, 
t Jo 

yielding the generalized NUTS criterion, 

amenable to HMC on Riemannian manifolds. 

In practice, the integral can be approximated by ag- 
gregating the momenta along discrete time intervals, 
ik — tk-i = Stk, 



'^EjrX^*^^-^' 



such as those naturally introduced in any discrete inte- 
grator. 

Rigorously defined, this generalized criterion identifies 
the TPOLO even in warped distributions such as the 
banana (Figure [7]). 



CONCLUSION 

By taking advantage of both symplectic and Rieman- 
nian geometry, NUTS is readily generalized to RMHMC. 
When applied to bound trajectories, this criterion iden- 
tifies the turning points of the longest oscillation and 
terminates the motion before it begins to double back on 
itself and waste computation. 

This generalization is readily adapted into the NUTS 
framework and is currently being in implemented in 
Stan fial. 
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FIG. 6. The Hamiltonian flow allows us to Lie drag the canonical one form, 9, from any point So to some final point St along 
the trajectory. This defines a unique momentum on the base manifold, M, providing a map of the momentum at _Ro = 7r(5'o) 
to Rt = IT {St), where it can be compared with the current momentum. 
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FIG. 7. Only with a well-defined criterion ([5]) do RMHMC trajectories, here using the SoftAbs metric [g] with a = 1, terminate 
at the TPOLO of the banana distribution. Note that the transient behavior persists, and care must still be taken at the 
beginning of a trajectory. 



